-
Notifications
You must be signed in to change notification settings - Fork 540
[MRG] Bug on dual potential (constraint violation when some weights are 0) #123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ncourty
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice bug correction @rflamary. Maybe you can be more explicit in the doc why we need this fix (mostly because of the way EMD is coded in C++). In the longer term, should we plan to integrate this patch rather inside the C++ code ?
ot/lp/__init__.py
Outdated
| def estimate_dual_null_weights(alpha0, beta0, a, b, M): | ||
| r"""Estimate feasible values for 0-weighted dual potentials | ||
| The feasible values are computed efficiently bjt rather coarsely. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but
ot/lp/__init__.py
Outdated
| \beta_j = \beta_j -v^b_j \quad \text{ if } b_j=0 \text{ and } v^b_j>0 | ||
| In the end the dual potential are centred using function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
potentials
are centered
ot/lp/__init__.py
Outdated
| :ref:`center_ot_dual`. | ||
| Note that all those updates do not change the objective value of the | ||
| solution but provide dual potential that do not violate the constraints. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
potentials
ot/lp/__init__.py
Outdated
| beta0 : (nt,) numpy.ndarray, float64 | ||
| Target dual potential | ||
| a : (ns,) numpy.ndarray, float64 | ||
| Source histogram (uniform weight if empty list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
histogram -> distribution ?
weight -> weights
ot/lp/__init__.py
Outdated
| a : (ns,) numpy.ndarray, float64 | ||
| Source histogram (uniform weight if empty list) | ||
| b : (nt,) numpy.ndarray, float64 | ||
| Target histogram (uniform weight if empty list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
ot/lp/__init__.py
Outdated
| bsel = b != 0 | ||
|
|
||
| # compute dual constraints violation | ||
| Viol = alpha0[:, None] + beta0[None, :] - M |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we change the name of the variable here by something more explicit, e.g. constraints_violation ?
ot/lp/__init__.py
Outdated
| # compute dual constraints violation | ||
| Viol = alpha0[:, None] + beta0[None, :] - M | ||
|
|
||
| # Compute worst violation per line and columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worst -> largest
test/test_ot.py
Outdated
| np.testing.assert_almost_equal(cost1, log['cost']) | ||
| check_duality_gap(a, b, M, G, log['u'], log['v'], log['cost']) | ||
|
|
||
| viol = log['u'][:, None] + log['v'][None, :] - M |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
ot/lp/__init__.py
Outdated
|
|
||
| def emd(a, b, M, numItermax=100000, log=False, dense=True): | ||
| def center_ot_dual(alpha0, beta0, a=None, b=None): | ||
| r"""Center dual OT potentials wrt theirs weights |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrt -> wrt.
ot/lp/__init__.py
Outdated
| r"""Center dual OT potentials wrt theirs weights | ||
| The main idea of this function is to find unique dual potentials | ||
| that ensure some kind of centering/fairness. It will help have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you be more explicit on the meaning of 'centering/fairness' ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have -> having
|
Thank you @ncourty i took care of your comments. We now have a warning in the emd_c function and the reason why we need to find proper dual potential is discussed. |
I just discovered this important bug in the `ot.emd``function and it feels like opening pandoras's box.
I added in the fisrt commit a test that checks that the constraints are satisfied and we can see in the travis build that the test fails. This comes from the fact that the c++ emd solver is sparse in the sens that if a weight is 0 it will discard the bin and solve a smaller subproblem on the histograms that have non-zero weights. It works very well and is very efficient in practice bug it does not return proper dual variables because it sets the value for the bins with 0 weight to 0 which can lead to constraint violation in practice.
What I will do in this PR:
ot.emdshould work by default (return proper dual variables/subgradients)The post-processing is actually necessary when trying to estimate correct dual variable from the optimal sub-problem if we want to avoid bias in the resulting dual potentials (bnasically everyone at 0 on either the left or right potential).